Contrastive Modules with Temporal Attention for Multi-Task Reinforcement Learning

Neural Information Processing Systems

In the field of multi-task reinforcement learning, the modular principle, which involves specializing functionalities into different modules and combining them appropriately, has been widely adopted as a promising approach to prevent the negative transfer problem, i.e., performance degradation due to conflicts between tasks. However, most existing multi-task RL methods only combine shared modules at the task level, ignoring that there may be conflicts within a task. In addition, these methods do not take into account that, without constraints, some modules may learn similar functions, restricting the expressiveness and generalization capability of modular methods. In this paper, we propose the Contrastive Modules with Temporal Attention (CMTA) method to address these limitations. CMTA constrains the modules to be different from each other via contrastive learning and combines shared modules at a finer granularity than the task level using temporal attention, alleviating negative transfer within tasks and improving both the generalization ability and the performance of multi-task RL. We conducted experiments on Meta-World, a multi-task RL benchmark containing various robotics manipulation tasks. Experimental results show that CMTA is the first method to outperform learning each task individually, and it achieves substantial performance improvements over the baselines.
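
As a rough illustration of the two mechanisms this abstract names, the sketch below (not the authors' implementation) pairs a set of shared modules, mixed per time step by attention, with a contrastive-style penalty that pushes module outputs apart. The module count, dimensions, and the specific penalty are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): K shared modules whose outputs are
# pushed apart by a contrastive-style penalty and mixed per time step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveModulesTemporalAttention(nn.Module):
    def __init__(self, obs_dim: int, hidden_dim: int = 64, n_modules: int = 4):
        super().__init__()
        self.modules_ = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, hidden_dim))
            for _ in range(n_modules)
        )
        # Attention weights over modules are recomputed at every time step,
        # so the mixture is finer-grained than a fixed per-task routing.
        self.attn = nn.Linear(obs_dim, n_modules)

    def forward(self, obs: torch.Tensor):
        # obs: (batch, obs_dim); one time step of a trajectory.
        outs = torch.stack([m(obs) for m in self.modules_], dim=1)  # (B, K, H)
        w = F.softmax(self.attn(obs), dim=-1)                       # (B, K)
        mixed = (w.unsqueeze(-1) * outs).sum(dim=1)                 # (B, H)
        return mixed, outs

def module_diversity_loss(outs: torch.Tensor) -> torch.Tensor:
    # Contrastive-style penalty: discourage modules from learning similar
    # functions by penalizing pairwise cosine similarity of their outputs.
    z = F.normalize(outs, dim=-1)                 # (B, K, H)
    sim = torch.einsum("bkh,blh->bkl", z, z)      # (B, K, K)
    off_diag = sim - torch.eye(z.shape[1], device=z.device)
    return off_diag.pow(2).mean()
```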


TRAVL: A Recipe for Making Video-Language Models Better Judges of Physics Implausibility

Motamed, Saman, Chen, Minghao, Van Gool, Luc, Laina, Iro

arXiv.org Artificial Intelligence

Despite impressive visual fidelity, modern video generative models frequently produce sequences that violate intuitive physical laws, such as objects floating, teleporting, or morphing in ways that defy causality. While humans can easily detect such implausibilities, there remains no robust method for quantitatively assessing physical realism in video. In this work, we explore whether Video-Language Models (VLMs) can be trained to serve as reliable judges of physical plausibility. We find that existing VLMs struggle to identify physics violations, exposing fundamental limitations in their temporal and causal reasoning. To address this, we introduce TRAVL, a fine-tuning recipe that combines a balanced training dataset with a trajectory-aware attention module to improve motion encoding and discrimination in VLMs. To evaluate physical reasoning more rigorously, we propose ImplausiBench, a benchmark of 300 videos (150 real, 150 generated) that removes linguistic biases and isolates visual-temporal understanding. Performance is reported both with gold-standard human judgments and stricter LLM-as-judge metrics. Together, TRAVL and ImplausiBench offer a unified framework for probing and improving physical plausibility in multimodal models, shedding light on a challenging and underexplored aspect of visual-temporal understanding.
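
The abstract does not detail the trajectory-aware attention module, but one plausible reading is attention restricted to tokens lying on the same tracked object trajectory across frames. The PyTorch sketch below illustrates that reading only; the tokenization, the tracker supplying trajectory ids, and all dimensions are assumptions.

```python
# Hedged sketch of "trajectory-aware" attention: patch tokens that belong to
# the same tracked trajectory attend to each other across frames.
import torch
import torch.nn as nn

class TrajectoryAttention(nn.Module):
    def __init__(self, dim: int = 256, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens: torch.Tensor, traj_ids: torch.Tensor):
        # tokens:   (B, T*N, D) patch tokens across T frames, N patches each
        # traj_ids: (B, T*N) integer trajectory id per token (-1 = untracked)
        same_traj = traj_ids.unsqueeze(2) == traj_ids.unsqueeze(1)  # (B, L, L)
        tracked = (traj_ids >= 0).unsqueeze(2) & (traj_ids >= 0).unsqueeze(1)
        # Mask out attention between tokens on different trajectories;
        # untracked tokens fall back to attending only to themselves.
        attn_mask = ~(same_traj & tracked)
        eye = torch.eye(tokens.shape[1], dtype=torch.bool, device=tokens.device)
        attn_mask = attn_mask & ~eye  # keep self-attention so softmax is defined
        mask = attn_mask.repeat_interleave(self.attn.num_heads, dim=0)
        out, _ = self.attn(tokens, tokens, tokens, attn_mask=mask)
        return self.norm(tokens + out)
```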



An Interpretable Transformer-Based Foundation Model for Cross-Procedural Skill Assessment Using Raw fNIRS Signals

Subedi, A., De, S., Cavuoto, L., Schwaitzberg, S., Hackett, M., Norfleet, J.

arXiv.org Artificial Intelligence

Current methods largely rely on subjective observation and rating, which introduce variability and bias, lack physiological grounding, and do not generalize across tasks or individuals. While functional near-infrared spectroscopy (fNIRS) has emerged as a promising non-invasive tool for measuring neural activity during task performance [Nemani et al., 2018; Gao et al., 2021; Keles et al., 2021], prior models -- including our own -- have largely relied on extensive preprocessing pipelines, lacked physiological interpretability, and demonstrated limited generalization across tasks or stress conditions. Recent sub-region-based analyses have shown that localized activation patterns in prefrontal and motor-related areas can distinguish between proficient and non-proficient performers [Nemani et al., 2018; Gao et al., 2021; Subedi et al., 2025]. The complex spatiotemporal dynamics captured in these signals contain rich information about cognitive and motor processes that could enable more robust, generalizable assessments across diverse settings. However, existing models tend to be task-specific and require retraining when applied to new procedures, impeding scalability. These limitations highlight the need for a domain-specific foundation model -- an architecture trained to extract reusable neural representations across various tasks, individuals, and conditions. In contrast to conventional pipelines, such a model would support modular adaptation to new contexts using minimal data, while retaining interpretability. Foundation models, initially developed in natural language processing and vision [Lu et.
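
As a purely illustrative sketch of what a transformer over raw fNIRS signals might look like (the paper's actual architecture is not given in this excerpt), the following encodes raw multichannel time series via temporal patching rather than hand-crafted preprocessing; all sizes, the patching scheme, and the classification head are assumptions.

```python
# Illustrative sketch only (not the authors' architecture): a transformer
# encoder over raw multichannel fNIRS time series with temporal patching.
# Positional encodings are omitted for brevity.
import torch
import torch.nn as nn

class FNIRSTransformer(nn.Module):
    def __init__(self, n_channels: int = 16, patch_len: int = 25,
                 dim: int = 128, n_layers: int = 4, n_classes: int = 2):
        super().__init__()
        # Each patch is a short window of all channels, linearly embedded.
        self.patch_embed = nn.Linear(n_channels * patch_len, dim)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(dim, n_classes)  # e.g., proficient vs. novice
        self.patch_len = patch_len

    def forward(self, x: torch.Tensor):
        # x: (batch, n_channels, time); time must be a multiple of patch_len.
        b, c, t = x.shape
        patches = x.reshape(b, c, t // self.patch_len, self.patch_len)
        patches = patches.permute(0, 2, 1, 3).reshape(b, -1, c * self.patch_len)
        tokens = self.patch_embed(patches)
        tokens = torch.cat([self.cls.expand(b, -1, -1), tokens], dim=1)
        h = self.encoder(tokens)
        return self.head(h[:, 0])  # classify from the [CLS] token
```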


Lightweight Spatio-Temporal Attention Network with Graph Embedding and Rotational Position Encoding for Traffic Forecasting

Wang, Xiao, Yang, Shun-Ren

arXiv.org Artificial Intelligence

Traffic forecasting is a key task in the field of Intelligent Transportation Systems. Recent research on traffic forecasting has mainly focused on combining graph neural networks (GNNs) with other models. However, GNNs only consider short-range spatial information. In this study, we present a novel model termed LSTAN-GERPE (Lightweight Spatio-Temporal Attention Network with Graph Embedding and Rotational Position Encoding). This model leverages both temporal and spatial attention mechanisms to effectively capture long-range traffic dynamics. Additionally, the optimal frequency for the rotational position encoding in both attention mechanisms is determined through a grid search. This systematic optimization enables the model to capture complex traffic patterns. The model also enhances feature representation by incorporating geographical location maps into the spatio-temporal embeddings. Without extensive feature engineering, the proposed method achieves competitive accuracy on the real-world traffic forecasting datasets PeMS04 and PeMS08.
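
"Rotational position encoding" here presumably refers to RoPE-style rotations of query and key vectors. The minimal sketch below shows such an encoding with a tunable base frequency, swept by grid search as the abstract describes; the base values and dimensions are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of rotary position encoding (RoPE) with a tunable base
# frequency, plus a grid search over candidate bases.
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (batch, seq_len, dim) queries or keys; dim must be even.
    b, t, d = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, dtype=torch.float32) / d))
    angles = torch.arange(t, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos = torch.cat((angles.cos(), angles.cos()), dim=-1)  # (t, d)
    sin = torch.cat((angles.sin(), angles.sin()), dim=-1)
    return x * cos + rotate_half(x) * sin

# Grid search over the RoPE base; each candidate would be scored on a
# validation set and the best-performing frequency kept.
for base in (100.0, 1_000.0, 10_000.0, 100_000.0):
    q = apply_rope(torch.randn(8, 12, 64), base=base)
    # ... train/evaluate the attention model with this encoding.
```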


Enhancing Multi-Text Long Video Generation Consistency without Tuning: Time-Frequency Analysis, Prompt Alignment, and Theory

Li, Xingyao, Zhang, Fengzhuo, Pan, Jiachun, Hou, Yunlong, Tan, Vincent Y. F., Yang, Zhuoran

arXiv.org Artificial Intelligence

Despite the considerable progress achieved in the long video generation problem, there is still significant room to improve the consistency of the videos, particularly in terms of smoothness and transitions between scenes. We address these issues to enhance the consistency and coherence of videos generated with either single or multiple prompts. We propose the Time-frequency based temporal Attention Reweighting Algorithm (TiARA), which meticulously edits the attention score matrix based on the Discrete Short-Time Fourier Transform. Our method is supported by a theoretical guarantee, the first of its kind for frequency-based methods in diffusion models. For videos generated by multiple prompts, we further investigate key factors affecting prompt interpolation quality and propose PromptBlend, an advanced prompt interpolation pipeline. The efficacy of our proposed method is validated via extensive experimental results, exhibiting consistent and impressive improvements over baseline methods. The code will be released upon acceptance.
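
TiARA's exact reweighting rule is not given in the abstract; the sketch below illustrates only the general idea of editing temporal attention scores in the short-time Fourier domain, here by damping high-frequency bins. The window size, hop, and damping rule are assumptions.

```python
# Hedged sketch: smooth a row of temporal attention scores by zeroing
# high-frequency STFT bins, then renormalize. Not TiARA's actual rule.
import torch

def lowpass_attention(scores: torch.Tensor, n_fft: int = 8,
                      keep: int = 3) -> torch.Tensor:
    # scores: (batch, time) attention weights over time steps.
    window = torch.hann_window(n_fft)
    spec = torch.stft(scores, n_fft=n_fft, hop_length=n_fft // 2,
                      window=window, return_complex=True)
    spec[:, keep:, :] = 0  # zero out high-frequency bins
    smoothed = torch.istft(spec, n_fft=n_fft, hop_length=n_fft // 2,
                           window=window, length=scores.shape[-1])
    smoothed = smoothed.clamp_min(0)
    # Renormalize so each row is again a valid attention distribution.
    return smoothed / smoothed.sum(dim=-1, keepdim=True).clamp_min(1e-8)
```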


CoLLAP: Contrastive Long-form Language-Audio Pretraining with Musical Temporal Structure Augmentation

Wu, Junda, Li, Warren, Novack, Zachary, Namburi, Amit, Chen, Carol, McAuley, Julian

arXiv.org Artificial Intelligence

Modeling temporal characteristics plays a significant role in the representation learning of audio waveforms. We propose Contrastive Long-form Language-Audio Pretraining (CoLLAP) to significantly extend the perception window for both the input audio (up to 5 minutes) and the language descriptions (exceeding 250 words), while enabling contrastive learning across modalities and temporal dynamics. Leveraging recent Music-LLMs to generate long-form music captions for full-length songs, augmented with musical temporal structures, we collect 51.3K audio-text pairs derived from the large-scale AudioSet training dataset, where the average audio length reaches 288 seconds. We propose a novel contrastive learning architecture that fuses language representations with structured audio representations by segmenting each song into clips and extracting their embeddings. With an attention mechanism, we capture multimodal temporal correlations, allowing the model to automatically weigh and enhance the final fusion score for improved contrastive alignment. Finally, we develop two variants of the CoLLAP model with different types of backbone language models. Through comprehensive experiments on multiple long-form music-text retrieval datasets, we demonstrate consistent performance improvement in retrieval accuracy compared with baselines. We also show the pretrained CoLLAP models can be transferred to various music information retrieval tasks, with heterogeneous long-form multimodal contexts.
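
As a hedged sketch of the fusion-and-contrast idea this abstract describes: segment each song into clips, let the caption embedding attend over the clip embeddings, and train with a symmetric InfoNCE-style loss. The encoders, the attention form, and the temperature below are placeholder assumptions, not CoLLAP's actual design.

```python
# Illustrative sketch: attention-weighted clip fusion feeding a symmetric
# contrastive (InfoNCE-style) loss between captions and songs.
import torch
import torch.nn.functional as F

def collap_style_loss(text_emb: torch.Tensor, clip_embs: torch.Tensor,
                      temperature: float = 0.07) -> torch.Tensor:
    # text_emb:  (B, D) one embedding per caption
    # clip_embs: (B, C, D) C clip embeddings per full-length song
    text_emb = F.normalize(text_emb, dim=-1)
    clip_embs = F.normalize(clip_embs, dim=-1)
    # Similarity of every caption to every song's clips.
    attn = torch.einsum("bd,ncd->bnc", text_emb, clip_embs)   # (B, B, C)
    # Attention over clips weighs the per-clip scores into one fusion score.
    weights = attn.softmax(dim=-1)
    fused = (weights * attn).sum(dim=-1)                      # (B, B)
    logits = fused / temperature
    targets = torch.arange(text_emb.shape[0], device=text_emb.device)
    # Symmetric loss: match captions to songs and songs to captions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```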